Claudia Mueller, University of Stuttgart, cmueller@sonivis.org
[PRIMARY contact]
Lukas Birn, Capgemini sd&m AG,
lukas.birn@capgemini-sdm.com
The open source programming language Processing [1] is used to create the visual model. Processing combines software concepts with principles of visual form and interaction; it is a text-based programming language developed to generate and modify images. Eclipse [2] is used as a convenient programming environment. It is planned to integrate this visual model into the open source software SONIVIS [3].
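As an illustration of this setup, the following minimal sketch shows how such a Processing visualization can be run from an Eclipse Java project (assuming a Processing 1.x/2.x project); the class name VisualModel and the window size are placeholders and not taken from the actual tool.

    import processing.core.PApplet;

    // Minimal Processing-in-Eclipse skeleton (class name and size are placeholders).
    public class VisualModel extends PApplet {

      public void setup() {
        size(1200, 700);    // one view holding the complete timeline
        background(0);
      }

      public void draw() {
        // employee rows, bars and coronas are drawn here
      }

      public static void main(String[] args) {
        PApplet.main(new String[] { "VisualModel" });
      }
    }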
The implemented visualization is designed as a monitoring instrument. It allows interactive exploration of the data, which is displayed in a single view. The provided data is imported directly.
The employees
are placed on the vertical axis in descending order and the timeline is
arranged on the horizontal axis. A green fading bar visualizes the entry of
employees into the embassy. A red bar depicts the attendance of an employee in
the restricted area. White bars visualize traffic on an employee’s computer.
The request and response payloads are displayed as sparklines: the positive amplitude of a white bar indicates the request payload and the negative amplitude the response payload. We define three rule violations. The first rule
violation “piggybacking into/out of restricted area” is displayed by a yellow
corona; the second rule violation “unattended network access” is represented by
an orange corona; and the third rule violation “access to suspicious IP” is
shown by a pink corona. Semitransparent green boxes highlight the existing
alibis of users. Based on the defined rule violations and the alibis, we detect
one employee as the suspect. A semitransparent white box highlights this
employee.
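As a compact summary of this encoding, the legend below sketches how the colours could be declared in a Processing sketch; the exact RGB and alpha values are illustrative assumptions, not the values used in the tool.

    // Illustrative colour legend of the visual model (RGB/alpha values are assumptions).
    color entryGreen       = color(0, 200, 0);         // green fading bar: entry into the embassy
    color classifiedRed    = color(220, 0, 0);         // red bar: attendance in the restricted area
    color trafficWhite     = color(255);               // white bars: network traffic sparklines
    color piggybackYellow  = color(255, 255, 0);       // rule violation 1: piggybacking into/out of restricted area
    color unattendedOrange = color(255, 160, 0);       // rule violation 2: unattended network access
    color suspiciousPink   = color(255, 100, 180);     // rule violation 3: access to suspicious IP
    color alibiGreen       = color(0, 200, 0, 60);     // semitransparent green box: alibi
    color suspectWhite     = color(255, 255, 255, 60); // semitransparent white box: the suspect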
References
[1] Processing, http://processing.org/
[2] Eclipse, http://eclipse.org
[3] SONIVIS, http://www.sonivis.org
Video: A short video presentation of the tool is available here.
ANSWERS:
MC1.1: Identify which computer(s)
the employee most likely used to send information to his contact in a
tab-delimited table which contains for each computer identified: when the
information was sent, how much information was sent and where that information
was sent.
MC1.2: Characterize the
patterns of behavior of suspicious computer use.
Our analytical process consists of six phases: acquire, parse, represent, reasoning, refine, and interact (cf. Figure 1). In the following sections, we briefly explain the first two phases, which are carried out by the software solution, and focus on the remaining four steps.
Figure 1: Analytical reasoning process, high-resolution version available here.
Phases: Acquire and Parse
All available data from Mini Challenge 1 (badge and network traffic) is downloaded. The developed software parses both data sets (proximity card logs and network traffic logs). A description of the software can be found in the section “Description of the technical solution”.
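A sketch of this parsing step in a Processing sketch is given below; the file names and column positions are assumptions and have to be adapted to the actual log formats.

    // Hypothetical parsing of the two log files; file names and column order are assumptions.
    String[] proxLines = loadStrings("proxLog.csv");
    for (int i = 1; i < proxLines.length; i++) {          // skip the header line
      String[] col = split(trim(proxLines[i]), ',');
      String time  = col[0];                              // timestamp of the badge event
      int employee = int(col[1]);                         // badge / employee number
      String event = col[2];                              // prox-in-building, prox-in-classified, prox-out-classified
      // store the event on the employee's timeline ...
    }

    String[] ipLines = loadStrings("IPLog.csv");
    for (int i = 1; i < ipLines.length; i++) {
      String[] col    = split(trim(ipLines[i]), ',');
      String time     = col[0];                           // timestamp of the traffic event
      String sourceIP = col[1];                           // computer that sent the request
      String destIP   = col[2];                           // target of the request
      int port        = int(col[3]);
      int reqSize     = int(col[4]);                      // request payload in bytes
      int respSize    = int(col[5]);                      // response payload in bytes
      // store the traffic record for the source computer ...
    }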
Phase: Represent
Firstly, we define the requirements for the visual model. One view should contain all necessary information. The visualization should allow an interactive exploration of the data, and it should be reusable as a monitoring instrument for similar questions.
Secondly, an appropriate visualization library is needed to present the available data. We decided to use the open source programming language Processing because of its low barrier to entry and the possibility of integrating it with the visual analytics software SONIVIS.
Thirdly, the basic visual model is specified. The presentation of the time-dependent employee activity data should be as simple as possible so that all information can be grasped quickly.
Finally, the initial visual model (v0.5) is programmed, containing only the proximity card logs. Employees are organized on the vertical axis in descending order. The timeline is arranged on the horizontal axis. There are three types of events: prox-in-building, prox-in-classified, and prox-out-classified. When an employee enters the embassy, a green fading bar shows the “fuzziness” of this event; a red bar depicts the employee’s attendance in the restricted area. If an employee leaves the restricted area without a preceding “prox-in-classified” event, or enters without a previous departure, a rule violation exists. This rule violation “piggybacking into/out of restricted area” is displayed by a yellow corona (cf. Figure 2).
Figure 2: Visual Model (v0.7), high-resolution version available here.
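A possible implementation of this piggybacking check is sketched below; events holds one employee’s prox events in chronological order, and markViolation() is an assumed helper that places the yellow corona at the offending event.

    // Hypothetical check for rule violation 1, "piggybacking into/out of restricted area".
    // "events" holds one employee's prox events in chronological order;
    // markViolation() is an assumed helper that adds the yellow corona.
    void checkPiggybacking(String[] events) {
      boolean inClassified = false;
      for (int i = 0; i < events.length; i++) {
        if (events[i].equals("prox-in-classified")) {
          if (inClassified) markViolation(i);     // enters again without a previous departure
          inClassified = true;
        } else if (events[i].equals("prox-out-classified")) {
          if (!inClassified) markViolation(i);    // leaves although no entry was recorded
          inClassified = false;
        }
      }
    }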
Phase: Reasoning I
The visual
model reveals three employees, nos. 30, 38 and 49, who go against policy and
piggyback (enter or leave the restricted area without badging in or out by
following a co-worker who did badge in or out).
However, the
available information is not sufficient to identify the suspicious person. The
network traffic logs are therefore integrated in the visual model.
Phase: Refine I
The initial
visual model is enhanced to reveal the target person. The IP traffic data
contains the sizes of the request and the response in bytes, the port, the
source IP address and the destination IP address. A white bar visualizes
traffic on an employee’s computer. The request and response payloads are displayed in a sparkline-like manner: the positive amplitude of the white bar indicates the request payload and the negative amplitude the response payload.
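The following sketch illustrates how a single traffic event could be drawn in Processing; the row baseline, the maximum bar height of ten pixels and the scaling via map() are assumptions.

    // Hypothetical drawing of one traffic event as a sparkline-like white bar.
    // rowY is the baseline of the employee's row, x the horizontal position of the event.
    void drawTraffic(float x, float rowY, int reqSize, int respSize, int maxSize) {
      stroke(255);                                    // white bar
      float up   = map(reqSize, 0, maxSize, 0, 10);   // request payload: positive amplitude
      float down = map(respSize, 0, maxSize, 0, 10);  // response payload: negative amplitude
      line(x, rowY, x, rowY - up);                    // screen y grows downwards, so "up" is rowY - up
      line(x, rowY, x, rowY + down);
    }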
Phase: Reasoning II
This version of the visual model (v0.8) shows the employees’ arrival at the embassy, their attendance in the restricted area, and the activity of their computers, including the request and response sizes.
For the
following analytical process, two assumptions are defined. Firstly, we presume
that only one person from the embassy is suspected of sending data to an
outside criminal organization. Secondly, we assume that each employee has access to every computer in the embassy, but that each computer is assigned to exactly one employee and should be used only by this employee. For example, only employee no. 10 uses the computer with the IP address 37.170.100.10. Therefore, we define our first hypothesis: “There should be no traffic on a personal computer during the absence of the defined user.”
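This assignment can be expressed, for example, by a small helper that derives the employee number from the last octet of the workstation’s IP address; the subnet 37.170.100.x is taken from the example above, the helper itself is an assumption.

    // Hypothetical helper: computer 37.170.100.<n> is assigned to employee no. <n>.
    int assignedEmployee(String sourceIP) {
      String[] octet = split(sourceIP, '.');
      if (octet.length == 4 && octet[0].equals("37")
          && octet[1].equals("170") && octet[2].equals("100")) {
        return int(octet[3]);     // last octet = employee number
      }
      return -1;                  // not an embassy workstation
    }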
Phase: Refine II
We apply this
hypothesis to define our second rule violation “unattended network access”. It
is represented by an orange corona in the visual model (cf. Figure 3).
Figure 3: Visual Model (v0.8), high-resolution version available here.
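One possible way to test this rule is sketched below; the interval arrays and the time representation in seconds are assumptions over the parsed prox data.

    // Hypothetical check for rule violation 2, "unattended network access":
    // traffic on a computer while its defined user is inside the restricted area.
    // classifiedIn/classifiedOut hold the matching entry/exit times of that user (in seconds).
    boolean unattendedAccess(long trafficTime, long[] classifiedIn, long[] classifiedOut) {
      for (int i = 0; i < classifiedIn.length; i++) {
        if (trafficTime >= classifiedIn[i] && trafficTime <= classifiedOut[i]) {
          return true;            // the defined user cannot sit at this computer right now
        }
      }
      return false;
    }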
Phase: Reasoning III
After applying this rule violation to our data set, we reveal eight events of computer activity that occur while the computer’s defined user is in the restricted area. The timelines of employees nos. 15, 16, 31, 41, 52 and 56 show this rule violation.
Based on our findings, we check the target IP address to which the data was sent. During unattended network access, the same target address is always used: the IP address 100.59.151.133. We define our second hypothesis: “The revealed IP address belongs to the criminal organization and every data transfer to this address is unauthorized.”
Phase: Refine III
We define our
third rule violation “access to suspicious IP”. It is represented by a pink corona.
A new version of our visual model is implemented (v0.9) showing all the defined
rule violations (cf. Figure 4).
Figure 4: Visual Model (v0.9), high-resolution version available here.
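The corresponding test is straightforward; the constant below repeats the address revealed in the Reasoning III phase, while the function name is an assumption.

    // Rule violation 3, "access to suspicious IP": every transfer to 100.59.151.133 is flagged.
    String suspiciousIP = "100.59.151.133";

    boolean suspiciousTransfer(String destIP) {
      return destIP.equals(suspiciousIP);   // marked with a pink corona in the visual model
    }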
Phase: Reasoning IV
Activities on the computers of eight further employees fall under this rule violation. These are employees nos. 8, 10, 13, 16, 18, 20, 31 and 32. In total, the defined rule violations disclose 13 employees with unusual behavior. We therefore regard them as suspects.
At the present time, we are not able to narrow the number down to one person. A further adaptation of our visual model is necessary. Therefore, we define the third
hypothesis: “Staying in the restricted area or using the defined user’s
computer can be seen as an alibi which excludes these employees from being the
suspect.”
Phase: Refine IV
We define two restrictions on the visualization: firstly, remove all users staying in the restricted area while an unauthorized data transfer happens, and secondly, remove all users using their defined computer at the very moment the unauthorized data transfer happens. For the second restriction, a time lag of two seconds is permitted between the unauthorized data transfer and the usage of the defined user’s computer. Consequently, an employee is not the target person if he used his computer within two seconds before or after the unauthorized data transfer.
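A sketch of this alibi test is given below; inClassifiedAt() and ownComputerActiveBetween() are assumed helpers over the parsed prox and traffic data, and t is the time of the unauthorized transfer in seconds.

    // Hypothetical alibi test for one employee and one unauthorized transfer at time t (seconds).
    boolean hasAlibi(int employee, long t) {
      // restriction 1: the employee is inside the restricted area while the transfer happens
      if (inClassifiedAt(employee, t)) return true;
      // restriction 2: the employee uses his own computer within +/- 2 seconds of the transfer
      if (ownComputerActiveBetween(employee, t - 2, t + 2)) return true;
      return false;
    }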
The final visual model contains all employees to whom the second and third rule violations apply. Semitransparent green boxes highlight the existing alibis of users staying in the restricted area while an unauthorized data transfer takes place (cf. Figure 5).
Phase: Reasoning V
The defined
restrictions reduce the number of displayed employees and their visualized
activities. Now the target person can be identified very easily. Based on the
defined rule violations and restrictions, we detect employee no. 48 as the
suspect. A semitransparent white box highlights this employee.
Figure 5: Visual Model (v1.0), high-resolution version available here.
Phase: Interact
Methods are added to the final visual model to manipulate all available data more conveniently and to allow the user to control the visualized employee list. During the analytical reasoning process, we realized that, first of all, an overview of the data is needed. The user should then be able to zoom into and filter the data and to retrieve details on demand. We therefore added further functions to the visualization, which are shown in the video contribution of our submission.
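The sketch below indicates how such functions can be added in a Processing sketch; the key bindings and the helper toggleSuspectFilter() are assumptions and do not reproduce the exact controls shown in the video.

    // Hypothetical interaction handlers (key bindings and helpers are assumptions).
    float zoom = 1.0;                               // horizontal zoom factor of the timeline

    void keyPressed() {
      if (key == '+') zoom *= 1.1;                  // zoom into the timeline
      if (key == '-') zoom *= 0.9;                  // zoom out towards the overview
      if (key == 'f') toggleSuspectFilter();        // filter: show only employees with rule violations
    }

    void mouseMoved() {
      // determine the event under the cursor and display its details on demand
    }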